Data processing system capable of performing vector/matrix processing and arithmetic processing
unit incorporated therein.



For speed-up of an arithmetic operation on vectors, matrices
or a vector and a matrix, an arithmetic processing unit (2)
provided in association with a central processing unit has a
program memory (8) for storing a microprogram corresponding
to a macro-instruction code representative of the arithmetic
operation and fed from the central processing unit, and
operand codes are transferred from an internal register array
(16) to operand registers (RG1 to RG4) assigned to an augend
and an addend or a multiplicand and a multiplier for
calculation carried out by an arithmetic and logic unit (18),
wherein the operand codes for the arithmetic operation are
successively transferred to the internal register array prior
to the execution of the micro-instruction codes, so that the
arithmetic processing unit completes the task without any
interrupt for receiving operands.
DATA PROCESSING SYSTEM CAPABLE OF PERFORMING VECTOR/MATRIX PROCESSING AND
ARITHMETIC PROCESSING UNIT INCORPORATED THEREIN



FIELD OF THE INVENTION 



This invention relates to a data processing system and, more
particularly, to a slave arithmetic processing unit associated
with a central processing unit incorporated in the data
processing system for carrying out various kinds of vector
processing.

DESCRIPTION OF THE RELATED ART 



A prior art slave arithmetic processing unit of the data
processing system is responsive to scalar arithmetic
instructions for the four fundamental arithmetic operations
and transcendental integral functions; however, no instruction
set is provided therein for the various vector or matrix
operations where the data processing system fetches more than
three operands. If a user needs to carry out a vector
calculation such as a vector addition or a matrix calculation
with the prior art data processing system, the vectors or the
matrices are loaded into a memory space or internal registers
of the processing unit assigned to the user, and the
processing unit repeatedly executes the adding and multiplying
instructions of the assembly instruction set.

For example, let us consider a prior art data 
processing system where a central processing unit 
without any capability of the arithmetic operations 
and a slave arithmetic processing unit associated 
with the central processing unit are incorporated. 

The slave processor fetches instruction codes from the
central processing unit and, then, executes the instructions
instead of the central processing unit. The slave processing
unit thus associated with the central processing unit looks
like imparting the capability thereof to the central
processing unit, and, for this reason, the slave processing
unit is called a "co-processing unit" in terms of the central
processing unit.

The executive sequence of the co-processing unit is, by way of
example, illustrated in Fig. 1 of the drawings. The central
processing unit transfers an arithmetic instruction code
requesting assistance to the co-processing unit at time t1.
With the arithmetic instruction, the co-processing unit shifts
a busy signal to an active low level representative of the
necessity of the waiting state of the central processing unit.
While the busy signal remains in the active low level, the
central processing unit is kept in the waiting state. The
co-processing unit decodes the arithmetic instruction code fed
from the central processing unit at time t2, and starts the
execution of the arithmetic instruction at time t3. Upon
completion of the arithmetic operation, the co-processing unit
allows the busy signal to be recovered to an inactive high
level at time t4, and, then, the central processing unit reads
out the status from the co-processing unit. The central
processing unit analyzes the status fed from the co-processing
unit at time t5, and proceeds to a new task if no exception
takes place. Thus, the central processing unit needs to
communicate with the co-processing unit over four clocks,
i.e., the instruction code transfer, the instruction code
decoding, the status transfer and the status analysis.

If the data processing system is requested to produce a
product array (x', y', z') from a 3 x 3 matrix (a, b, c, d, e,
f, g, h, i) and a vector with three elements (x, y, z), the
central processing unit needs to instruct the co-processing
unit on nine multiplying operations and six adding operations
as shown in Fig. 2. Since each adding operation and each
multiplying operation correspond to a single adding
instruction code and a single multiplying instruction code,
respectively, fifteen arithmetic instruction codes are
repeatedly supplied from the central processing unit to the
co-processing unit for completion of the given calculation.
Assuming now that the co-processing unit consumes fifteen
clocks for each execution of the multiplying instruction code
and twelve clocks for each execution of the adding instruction
code (each including the four clocks for the communication
described hereinbefore), the data processing system consumes
two hundred and seven clocks for the given task.
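
The figure of two hundred and seven clocks follows directly from the counts given above. A minimal Python sketch of the tally is shown here; the variable names are illustrative only and do not appear in the drawings.

```python
# Prior-art tally for the 3 x 3 matrix times 3-element vector example.
# The per-instruction costs are those assumed in the text; each one
# already includes the four communication clocks.
MULTIPLY_CLOCKS = 15
ADD_CLOCKS = 12

n = 3                       # order of the matrix
multiplies = n * n          # nine multiplying operations
adds = n * (n - 1)          # six adding operations

instruction_codes = multiplies + adds
total_clocks = multiplies * MULTIPLY_CLOCKS + adds * ADD_CLOCKS

print(instruction_codes, total_clocks)
```

The sketch gives fifteen instruction codes and 135 + 72 = 207 clocks, in agreement with the text.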

Thus, the prior art data processing system consumes a large
amount of time for the vector operation and the matrix
operation due to the repetition of the adding and multiplying
operations as well as the communication between the central
processing unit and the co-processing unit.

SUMMARY OF THE INVENTION 

It is therefore an important object of the present invention
to provide a data processing system which executes vector and
matrix operations in a relatively small amount of time.

It is also an important object of the present invention to
provide a co-processing unit which carries out arithmetic
operations on vectors and matrices in response to vector and
matrix arithmetic instructions.

To accomplish these objects, the present in- 
vention proposes to provide microprograms repre- 
sentative of arithmetic operations on vectors, a 
vector and matrix and matrices in a program mem- 
ory incorporated in an arithmetic processing unit. 

In accordance with one aspect of the present invention, there
is provided an arithmetic processing unit provided in
association with a central processing unit having a plurality
of instruction codes including a plurality of macro-instruction
codes respectively representative of arithmetic operations on
scalar numbers, on a vector and a matrix, on a plurality of
vectors and on a plurality of matrices, comprising: a) a
program memory unit storing a plurality of microprograms
including microprograms corresponding to the macro-instructions
representative of the arithmetic operations; b) an instruction
decoder unit supplied with one of the macro-instruction codes
and producing a decoded signal indicative of a starting address
of one of the microprograms corresponding to aforesaid one of
the macro-instructions for successively reading out a
micro-instruction code sequence from the program memory; c) a
controlling unit responsive to the micro-instruction code
sequence and operative to produce a plurality of controlling
signals and to shift a busy signal between an active level and
an inactive level so as to cause the central processing unit to
enter a waiting state and to be recovered therefrom; d) an
internal register array having a plurality of registers for
memorizing a plurality of operand codes, sums, products and
product-sums; e) a resultant register for storing one of the
sums of the operand codes, one of the products of the operand
codes and one of the product-sums; f) a plurality of operand
registers partially assigned to the operand codes serving as an
augend and an addend and partially assigned to the operand
codes serving as a multiplicand and a multiplier; g) an
arithmetic and logic unit responsive to the controlling signals
and operative to perform at least arithmetic operations on the
operand codes in the operand registers for producing one of the
sums, one of the products and one of the product-sums in the
resultant register; and h) a data input-and-output port
communicable with external units including the central
processing unit for receiving the operand codes and for
transferring one of the sums, one of the products and the
product-sums.

In accordance with another aspect of the present invention,
there is provided a data processing system comprising a) a
central processing unit having a plurality of instruction codes
including a plurality of macro-instruction codes respectively
representative of arithmetic operations on scalar numbers, on a
vector and a matrix, on a plurality of vectors and on a
plurality of matrices, the central processing unit further
having a plurality of operand codes and macro-instructions for
requesting a successive receiving operation on the operands and
a successive transferring operation on at least product-sums;
and b) an arithmetic processing unit comprising b-1) a program
memory unit storing a plurality of microprograms including
microprograms corresponding to the macro-instructions
representative of the arithmetic operations and microprograms
for the successive receiving operation and the successive
transferring operation, b-2) an instruction decoder unit
supplied with one of the macro-instruction codes and producing
a decoded signal indicative of a starting address of one of the
microprograms corresponding to aforesaid one of the
macro-instructions for successively reading out a
micro-instruction code sequence from the program memory, b-3) a
controlling unit responsive to the micro-instruction code
sequence and operative to produce a plurality of controlling
signals and to shift a busy signal between an active level and
an inactive level so as to cause the central processing unit to
enter a waiting state and to be recovered therefrom, b-4) an
internal register array having a plurality of registers for
memorizing the operand codes, sums, products and the
product-sums, b-5) a resultant register for storing one of the
sums of the operand codes, one of the products of the operand
codes and one of the product-sums, b-6) a plurality of operand
registers partially assigned to the operand codes serving as an
augend and an addend and partially assigned to the operand
codes serving as a multiplicand and a multiplier, b-7) an
arithmetic and logic unit responsive to the controlling signals
and operative to perform at least arithmetic operations on the
operand codes in the operand registers for producing aforesaid
one of the sums, aforesaid one of the products and aforesaid
one of the product-sums in the resultant register, and b-8) a
data input-and-output port responsive to the controlling
signals and communicable with the central processing unit for
successively receiving the operand codes and for successively
transferring the product-sums, the data input-and-output port
further operative to transfer one of the sums and one of the
products.

BRIEF DESCRIPTION OF THE DRAWINGS

The features and advantages of a data processing system and a
co-processing unit according to the present invention will be
more clearly understood from the following description taken in
conjunction with the accompanying drawings in which:

Fig. 1 is a timing chart showing the behavior of a prior art
data processing system;

Fig. 2 is a view showing a multiplication of a matrix and a
vector calculated by the prior art data processing unit;

Fig. 3 is a block diagram showing the arrangement of an
arithmetic processing unit embodying the present invention;

Fig. 4 is a block diagram showing the arrangement of a data
processing system fabricated by using the arithmetic processing
unit shown in Fig. 3;

Fig. 5 is a flow chart showing a micro-instruction code
sequence executed by the arithmetic processing unit shown in
Fig. 3;

Fig. 6 is a flow chart showing another micro-instruction code
sequence executed by the arithmetic processing unit shown in
Fig. 3;

Fig. 7 is a flow chart showing still another micro-instruction
code sequence for the vector/matrix operation executed by the
arithmetic processing unit shown in Fig. 3;

Figs. 8A and 8B are timing charts showing the sequence of
vector/matrix operation achieved by the data processing system
according to the present invention;

Fig. 9 is a block diagram showing the circuit arrangement of an
essential part of the arithmetic and logic unit incorporated in
the arithmetic processing unit shown in Fig. 3;

Fig. 10 is a block diagram showing the circuit arrangement of
another essential part of the arithmetic and logic unit
incorporated in the arithmetic processing unit shown in Fig. 3;

Fig. 11 is a view showing the relationship between the bit
string in the register 142 shown in Fig. 9 and the data bits
fed to the shifters 148 and 150 and the subtracter 156;

Fig. 12 is a view showing the operand divided into a plurality
of sections each overlapped with the adjacent sections by one
bit in accordance with Booth's multiplication algorithm; and

Fig. 13 is a view showing the relationship between the bit
pattern of each section and the reference numeral designating
the selected register.

DESCRIPTION OF THE PREFERRED EMBODIMENT



Referring first to Fig. 3 of the drawings, an arithmetic
processing unit 2 embodying the present invention comprises a
command port 4 for a macro-instruction code fed from a central
processing unit 32 (shown in Fig. 4), and the macro-instruction
code is transferred from the command port 4 to an instruction
decoder unit 6. In a read only memory unit 8 are stored a
plurality of microprograms, each read out therefrom in response
to a decoded signal DS produced by the instruction decoder unit
6 for providing a micro-instruction stream to a controlling
unit 10, and the controlling unit 10 produces various
controlling signals 12 for achievement of each
micro-instruction through selective activations of component
units and circuits. The controlling unit 10 further produces a
status code ST which is supplied from a status port 14 to the
central processing unit 32 for reporting the actual status upon
completion of a given task. The controlling unit 10 also
produces a busy signal BUSY of an active low level which is fed
to the central processing unit 32 for establishment of a
waiting state. The microprograms include vector and matrix
arithmetic microprograms as well as ordinary arithmetic
microprograms for the four fundamental arithmetic operations,
and, for this reason, the central processing unit 32 merely
provides macro-instruction codes respectively representative of
vector and matrix arithmetic operations.
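
The dispatch described above, in which the decoded signal DS selects a starting address and micro-instructions are then read out in sequence, can be pictured with a small Python sketch. The opcode names, addresses, micro-instruction names and the END marker are all invented for illustration; the specification does not give the contents of the read only memory unit 8.

```python
# Illustrative model only: every name and address here is hypothetical.
PROGRAM_MEMORY = [
    # address 0: microprogram for an ordinary addition
    "BUSY_ACTIVE", "LOAD_RG1", "LOAD_RG2", "ADD",
    "STORE_RG5", "BUSY_INACTIVE", "END",
    # address 7: microprogram for an ordinary multiplication
    "BUSY_ACTIVE", "LOAD_RG3", "LOAD_RG4", "MULTIPLY",
    "STORE_RG5", "BUSY_INACTIVE", "END",
]

# Plays the role of the instruction decoder unit 6:
# macro-instruction code -> starting address (decoded signal DS).
START_ADDRESS = {"ADD": 0, "MUL": 7}


def micro_instruction_stream(macro_code):
    """Yield micro-instructions from the starting address up to END."""
    address = START_ADDRESS[macro_code]
    while PROGRAM_MEMORY[address] != "END":
        yield PROGRAM_MEMORY[address]
        address += 1


print(list(micro_instruction_stream("MUL")))
```

A vector/matrix macro-instruction would simply map to a third, longer microprogram in the same memory, which is why the central processing unit needs to issue only one code for the whole operation.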

The arithmetic processing unit 2 further comprises a register
array 16, an arithmetic and logic unit 18 capable of executing
at least an adding operation and a multiplying operation, four
operand registers RG1, RG2, RG3 and RG4, and a resultant
register RG5. The register array 16 is communicable with the
operand registers RG1 to RG4 through an internal bus system 20,
and the resultant register RG5 is also communicable with the
register array 16 through the internal bus system 20. The
register array 16 is constituted by a large number of registers
including registers labeled with a to i, x, y, z, x', y' and
z'. The internal bus system 20 is further coupled in parallel
to a data input-and-output port 22 and an address port 24, and
the arithmetic processing unit 2 is communicable with external
devices through these ports 22 and 24.

The arithmetic processing unit 2 thus arranged forms a part of
a data processing system together with the central processing
unit 32, and serves as a co-processor for the central
processing unit 32. As shown in Fig. 4 of the drawings, the
central processing unit 32 and the arithmetic processing unit 2
are accessible to a main memory unit 34 through an external bus
system provided with an external data bus 36 and an external
address bus 38. The busy signal line BUSY is coupled to a
control signal port of the central processing unit 32.

The behavior of the data processing system thus fabricated is
described hereinbelow. First, assuming now that the data
processing system is requested to carry out an adding operation
on data codes a and b to produce the sum x, the central
processing unit 32 supplies a macro-instruction code
representative of the ordinary adding operation to the command
port 4 of the arithmetic processing unit 2, and, then, the
macro-instruction code is transferred to the instruction
decoder unit 6 for decoding. The macro-instruction code is
decoded by the decoder unit 6, and the decoded signal DS
specifies the starting address of the microprogram for the
ordinary adding operation. Then, a series of micro-instruction
codes are sequentially read out from the read only memory unit
8.

In detail, the micro-instruction code sequence 
allows the controlling unit 10 to shift the busy 
signal BUSY into the active low level, and the busy 
signal is supplied to the controlling port of the 
central processing unit 32 through the busy signal 
line BUSY. With the busy signal, the central pro- 
cessing unit 32 enters the waiting state until the 
recovery of the busy signal. 

Subsequently, the central processing unit 32 provides the
augend a and the addend b to the arithmetic processing unit 2.
The augend a and the addend b are transferred from the data
input-and-output port 22 through the internal bus system 20 to
the register array 16, and are memorized into the registers a
and b, respectively. When the augend and the addend are
memorized in the internal register array 16, the augend a is
supplied to the operand register RG1 as the first operand OP1A
as by step A of Fig. 5, and the addend b is supplied to the
operand register RG2 as the second operand as by step B of
Fig. 5. When the two operands are thus stored in the operand
registers RG1 and RG2, respectively, the arithmetic and logic
unit 18 adds the addend b to the augend a as by steps C to G of
Fig. 5 to produce the sum x in the resultant register RG5. The
sum x is transferred to the register x of the internal register
array 16 as by step H of Fig. 5. The micro-instruction code
sequence causes the busy signal BUSY to be recovered to the
inactive high level, so that the central processing unit 32
fetches the status code indicative of any exception. If no
exception takes place in the ordinary adding operation, the sum
x is transferred to the central processing unit, and the data
processing system confirms the completion of the given task. As
will be seen from Fig. 5, the arithmetic processing unit
consumes eight clocks for completion of the ordinary adding
operation.

Next, description is made for an ordinary multiplying operation
carried out by the data processing system with reference to
Fig. 6. The central processing unit 32 supplies a
macro-instruction code representative of the ordinary
multiplying operation to the command port 4 of the arithmetic
processing unit 2, and, then, the macro-instruction code is
transferred to the instruction decoder unit 6 for decoding. The
macro-instruction code is decoded by the decoder unit 6, and
the decoded signal DS specifies the starting address of the
microprogram for the ordinary multiplying operation. Then, a
series of micro-instruction codes are sequentially read out
from the read only memory unit 8.

In detail, the micro-instruction code sequence first allows the
controlling unit 10 to shift the busy signal BUSY into the
active low level, and the busy signal is supplied to the
controlling port of the central processing unit 32 through the
busy signal line BUSY. With the busy signal, the central
processing unit 32 enters the waiting state until the recovery
of the busy signal.

Subsequently, the central processing unit 32 provides the
multiplicand a and the multiplier b to the arithmetic
processing unit 2. The multiplicand a and the multiplier b are
transferred from the data input-and-output port 22 through the
internal bus system 20 to the register array 16, and are
memorized into the registers a and b, respectively. When the
multiplicand and the multiplier are memorized in the internal
register array 16, the multiplicand a is transferred to the
operand register RG3 as by step A of Fig. 6, and the multiplier
b is transferred to the operand register RG4 as by step B of
Fig. 6. When the two operands OP1M and OP2M are thus stored in
the operand registers RG3 and RG4, respectively, the arithmetic
and logic unit 18 multiplies a by b as by steps C to J of
Fig. 6 to produce the product x. The product x is transferred
from the resultant register RG5 to the register x of the
internal register array 16. The micro-instruction code sequence
then instructs the controlling unit 10 to recover the busy
signal BUSY to the inactive high level, which is reported to
the controlling port of the central processing unit 32. Then,
the central processing unit 32 requests the status code from
the arithmetic processing unit 2, and the status code is
analyzed by the central processing unit to see whether or not
any exception takes place during the ordinary multiplying
operation. If no exception takes place, the product x is
transferred to the central processing unit 32, and the data
processing system confirms the completion of the given task. In
the ordinary multiplying operation, the arithmetic processing
unit 2 consumes eleven clocks as will be seen from Fig. 6.

Finally, description is made for a vector/matrix arithmetic
operation with reference to Fig. 7 of the drawings. The
vector/matrix arithmetic operation is carried out for the
equation shown in Fig. 2, and the same sequence is repeated for
producing the product array (x', y', z'), and, for this reason,
the description is focused upon the product x' for the sake of
simplicity.

All of the operand codes a to i and x to z are assumed to be
memorized in the internal register array 16. The central
processing unit 32 provides a macro-instruction representative
of a vector/matrix arithmetic operation to the command port 4
of the arithmetic processing unit 2 in a manner similar to the
prior art data processing system, the macro-instruction is
transferred from the command port 4 to the instruction decoder
unit 6, and the instruction decoder unit 6 produces the decoded
signal DS representative of the starting address of a
microprogram on the basis of the macro-instruction. When the
starting address is specified, a sequence of the
micro-instruction codes is successively read out from the read
only memory unit 8, and is supplied to the controlling unit 10.
In accordance with the micro-instruction code sequence, the
controlling unit 10 selectively produces the controlling
signals 12 for the executions 62, which are supplied to the
component units and circuits.

First, the controlling unit 10 shifts the busy signal BUSY to
the active low level, which is reported to the central
processing unit 32 so that the central processing unit 32
enters the waiting state. The essential part of the
micro-instruction codes is illustrated in detail in Fig. 7. The
operand code a is transferred to the operand register RG3 as by
step MA, and the operand code x is further transferred to the
operand register RG4 as by step MB, then the arithmetic and
logic unit 18 multiplies the operand code a by the operand code
x as by steps MC to MJ to produce the product. The product of
the operand codes a and x is transferred from the resultant
register RG5 to the operand register RG1 as by step MK.

For producing the product of the operand codes b and y, the
operand code b is transferred to the operand register RG3 as by
step ML, and the operand code y is further transferred to the
operand register RG4 as by step MM. When the multiplicand and
the multiplier are stored in the respective operand registers
RG3 and RG4, the arithmetic and logic unit 18 multiplies the
operand code b by the operand code y as by steps MN to MU to
produce the product, and the product of the operand codes b and
y is transferred from the resultant register RG5 to the operand
register RG2 as by step MV.

Thus, the augend and the addend are provided into the operand
registers RG1 and RG2, and the arithmetic and logic unit 18
adds the product of the operand codes b and y to the product of
the operand codes a and x to produce the sum as by steps AA to
AE. The sum is transferred from the resultant register RG5 to
the operand register RG1 again as by step AF.

The operand codes c and z are then transferred to the operand
registers RG3 and RG4 as by steps NA and NB, and the arithmetic
and logic unit 18 multiplies the operand code c by the operand
code z to produce the product as by steps NC to NJ. The product
of the operand codes c and z is then transferred from the
resultant register RG5 to the operand register RG2 as by step
NK, and the arithmetic and logic unit 18 adds the addend in the
operand register RG2 to the augend in the operand register RG1
as by steps AG to AK to produce the product-sum x'. The
product-sum is given by x' = ax + by + cz, and is transferred
from the resultant register RG5 to the register x' of the
internal register array 16 as by step AL. As will be understood
from Fig. 7, the arithmetic processing unit 2 consumes forty
five clocks for producing the product-sum x'.
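
The forty five clocks can be checked against the step labels named in the text. The following Python sketch lists the steps of each phase and mirrors the register traffic; it assumes one clock per labeled step, which is consistent with the eight-clock addition and eleven-clock multiplication described earlier but is not stated explicitly. The function and variable names are illustrative.

```python
def step_names(prefix, first, last):
    """Return step labels such as MA, MB, ... for one phase."""
    return [prefix + chr(code) for code in range(ord(first), ord(last) + 1)]


SEQUENCE = (
    step_names("M", "A", "K")    # a * x, product moved to RG1
    + step_names("M", "L", "V")  # b * y, product moved to RG2
    + step_names("A", "A", "F")  # first addition, sum moved back to RG1
    + step_names("N", "A", "K")  # c * z, product moved to RG2
    + step_names("A", "G", "L")  # second addition, x' stored in the array
)


def product_sum(a, b, c, x, y, z):
    """Follow the same data path through RG1 and RG2 for one row."""
    rg1 = a * x          # steps MA to MK
    rg2 = b * y          # steps ML to MV
    rg1 = rg1 + rg2      # steps AA to AF
    rg2 = c * z          # steps NA to NK
    return rg1 + rg2     # steps AG to AL


clocks = len(SEQUENCE)   # assumes one clock per step
print(clocks, product_sum(1, 2, 3, 4, 5, 6))
```

Three multiplications of eleven steps and two additions of six steps give 33 + 12 = 45 steps in total.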

The above mentioned steps MA to AL are repeated to produce the
product-sums y' and z', which are respectively memorized in the
registers y' and z'. Since forty five clocks are consumed to
produce each of the product-sums, the arithmetic processing
unit 2 consumes a hundred and thirty five clocks for the
completion of the vector/matrix arithmetic operation.

Thus, the product-sums x', y' and z' are calculated as
instructed, and the micro-instruction code sequence allows the
busy signal BUSY to go up to the inactive high level. When the
central processing unit acknowledges the completion of the
vector/matrix arithmetic operation through the recovery of the
busy signal BUSY, the central processing unit 32 requests the
status code from the arithmetic processing unit 2, and the
status code is analyzed by the central processing unit 32 to
see whether or not any exception takes place in the
vector/matrix arithmetic operation. If no exception takes
place, the product-sums x', y' and z' are transferred to a
certain unit, and the data processing system confirms the
completion of the given task.

Since four clocks are needed for the commu- 
nication between the central processing unit 32 and 
the arithmetic processing unit 2 for the macro- 
instruction code and the status code, the total 
number of the clocks consumed is a hundred and 
thirty nine. Thus, the data processing system ac- 
cording to the present invention is improved in the 
time period consumed in the vector/matrix 
arithmetic operation in comparison with the prior art 
data processing system. 
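
As a quick check of the figure quoted above, the tally can be written out in Python. This sketch covers only the case where the operand codes are already held in the arithmetic processing unit; the constant names are illustrative.

```python
# Clock tally for the vector/matrix operation with operands preloaded.
CLOCKS_PER_PRODUCT_SUM = 45    # steps MA to AL for one product-sum
PRODUCT_SUMS = 3               # x', y' and z'
COMMAND_AND_STATUS_CLOCKS = 4  # macro-instruction and status exchange

calculation_clocks = CLOCKS_PER_PRODUCT_SUM * PRODUCT_SUMS
total_clocks = calculation_clocks + COMMAND_AND_STATUS_CLOCKS

print(calculation_clocks, total_clocks)
```

The communication cost is paid once for the whole operation rather than once per adding or multiplying instruction, which is where the saving comes from.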

The vector/matrix arithmetic operation is described on the
assumption that all of the operand codes have already been
memorized in the internal register array 16; however, if no
operand code is stored in the arithmetic processing unit 2, the
operand codes a to i and x to z are supplied from the outside
thereof as illustrated in Figs. 8A and 8B. Namely, the
vector/matrix arithmetic operation starts with the first
operand transferring operation 52 of the first operand code
from the central processing unit 32 to the arithmetic
processing unit 2 at time t10, which is followed by the second
operand transferring operation 54. The operand transferring
operation is repeated a predetermined number of times for
memorizing all of the operand codes a to i and x to z in the
internal register array 16 of the arithmetic processing unit 2.
Each operand transferring operation needs four clocks.

When the last operand transferring operation 56 is completed
(at time t11), the central processing unit 32 transfers the
macro-instruction code representative of the vector/matrix
arithmetic operation to the arithmetic processing unit 2 as
indicated by broken line 58. The macro-instruction code is
decoded by the decoder unit 6, and the starting address of the
microprogram is specified by the decoded signal DS, then the
busy signal BUSY goes down to the active low level at time t12.
Thus, the central processing unit 32 is forced to enter the
waiting state, and the vector/matrix arithmetic operation is
carried out in accordance with the sequence shown in Fig. 7.

When the vector/matrix arithmetic operation is completed, the
busy signal BUSY is recovered to the inactive high level at
time t13. The central processing unit 32 requests the status
code from the arithmetic processing unit 2, and the status code
is supplied to the central processing unit 32 as indicated by
broken line 64. The central processing unit checks the status
code to see whether or not any exception takes place in the
vector/matrix arithmetic operation. If no exception takes
place, the central processing unit 32 requests the arithmetic
processing unit to transfer all of the product-sums x', y' and
z'. Then, the arithmetic processing unit 2 successively
transfers all of the product-sums to the central processing
unit 32 as indicated by broken lines 66 to 68. The successive
receiving operation on the operand codes and the successive
transferring operation on the product-sums are instructed by
the central processing unit with macro-instruction codes, and
the read only memory unit 8 stores the corresponding
microprograms.

Thus, it is necessary for the data processing system to
transfer the operand codes as well as the product-sums between
the central processing unit 32 and the arithmetic processing
unit 2, so that an additional time period is consumed for the
transferring operations. For example, if only two operand codes
are transferred from the central processing unit 32 to the
arithmetic processing unit 2 and only one calculation result is
transferred in the opposite direction, the total number of the
additional clocks is twelve, and, for this reason, sixteen
clocks are consumed for the communication between the central
processing unit 32 and the arithmetic processing unit 2.

However, if the vector/matrix arithmetic operation is carried
out on a 3 x 3 matrix and a vector with three scalar numbers,
twelve operand codes and three product-sums are transferred,
and the additional time period is calculated as

4 (clocks) x 15 (times) = 60 (clocks)

The total number of the clocks for the communication is 64.
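
The communication cost in both cases follows one rule: four clocks per transferred code, plus the four clocks for the macro-instruction code and the status code. A minimal Python sketch, with a hypothetical helper name, is given below.

```python
CLOCKS_PER_TRANSFER = 4        # each operand or result transfer
COMMAND_AND_STATUS_CLOCKS = 4  # macro-instruction and status exchange


def communication_clocks(operands_in, results_out):
    """Clocks spent on communication for one macro-instruction."""
    transfers = operands_in + results_out
    return transfers * CLOCKS_PER_TRANSFER + COMMAND_AND_STATUS_CLOCKS


scalar_case = communication_clocks(2, 1)             # two operands, one result
vector_matrix_case = communication_clocks(9 + 3, 3)  # a to i, x to z, three product-sums

print(scalar_case, vector_matrix_case)
```

The scalar case gives 12 + 4 = 16 clocks and the vector/matrix case gives 60 + 4 = 64 clocks.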

If the prior art data processing unit carries out the
arithmetic operation on the 3 x 3 matrix and the vector with
three scalar numbers, the arithmetic processing unit performs
nine multiplying operations and six adding operations. The
total number of the clocks consumed for each multiplying
operation is calculated as

11 (clocks for the calculation) +
16 (clocks for the communication) = 27 (clocks)

The total number of the clocks consumed for each adding
operation is given as

8 (clocks for the calculation) +
16 (clocks for the communication) = 24 (clocks)

Then, the total number of the clocks consumed for the task is
calculated as

27 (clocks) x 9 (times) + 24 (clocks) x 6 (times) =
387 (clocks)

On the other hand, the data processing system according to the
present invention consumes only 199 clocks for completion of
the same task as follows:

135 (clocks for the calculation) +
64 (clocks for the communication) = 199 (clocks)

Thus, the time consumption of the data processing
system according to the present invention is
decreased to about a half of that consumed by the
prior art data processing system, and, for this
reason, the built-in microprograms are effective for
improvement in the operation speed.
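
The clock counts quoted above can be checked with a short
calculation. The following is only a sketch: the function
name and the 4-clock base cost are assumptions (the base
cost is inferred from the sixteen clocks quoted for three
transfers), and the 135 calculation clocks are taken from
the text as given.

```python
# Sketch of the clock accounting. CLOCKS_PER_TRANSFER comes from the text;
# BASE_CLOCKS is an assumption inferred from 16 - 3 * 4 = 4.
CLOCKS_PER_TRANSFER = 4
BASE_CLOCKS = 4


def communication_clocks(transfers: int) -> int:
    """Clocks spent on communication for a given number of transfers."""
    return BASE_CLOCKS + CLOCKS_PER_TRANSFER * transfers


# Prior art: every scalar operation moves two operands and one result.
per_multiply = 11 + communication_clocks(3)   # 27 clocks
per_add = 8 + communication_clocks(3)         # 24 clocks
prior_art_total = 9 * per_multiply + 6 * per_add

# Invention: nine matrix elements, three vector elements and three
# product-sums are moved once; 135 calculation clocks as quoted.
invention_total = 135 + communication_clocks(15)

ratio = invention_total / prior_art_total
```

Under these assumptions the ratio comes out slightly above
one half, which agrees with the comparison made in the text.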

As will be understood from the foregoing description,
the data processing system is improved
in the operation speed, because the arithmetic
operation on the vector and the matrix is carried out
without any interrupt for operand receipt.

The macro-instruction code described above is
representative of the arithmetic operation on the
vector and the matrix; however, another
macro-instruction code is representative of an
arithmetic operation on a plurality of vectors, and
still another macro-instruction code is representative
of an arithmetic operation on a plurality of matrices.

Turning to Figs. 9 and 10 of the drawings,
essential parts of the arithmetic and logic unit 18
are illustrated in detail. The essential part shown in
Fig. 9 carries out the adding operation, and another
essential part shown in Fig. 10 performs the
multiplying operation.

The operand register RG1 has a single sign bit
SN1, eight exponential bits EX1 and twenty three
mantissa bits MTS1, and thirty two bits of the
operand register RG2 are shared by the sign, the
exponent and the mantissa SN2, EX2 and MTS2
similarly to the operand register RG1. Each of the
operand registers RG1 and RG2 thus arranged is
capable of providing a storage for a number
represented in the single precision floating point
format according to the IEEE 754 standard. When each
of the exponent bits EX1 and EX2 is read out from
the operand register RG1 or RG2, a bit of "0" is
automatically added next to the highest order bit
thereof. According to the IEEE 754 standard, the
most significant bit indicative of "1" is deleted from
the mantissa bits as the hidden bit and, for this
reason, the bit of "1" is placed next to the
highest order bit. However, a bit of "0" is further
placed next to the bit of "1" upon reading out from
the operand register RG1 or RG2, so that the
mantissa read out from the operand register RG1
or RG2 is represented by twenty five bits. The
exponential bits EX1 and EX2 are supplied to a
subtracter 100, and the subtracter 100 produces a
difference represented by nine bits. The mantissa
bits MTS1 and MTS2 each represented by twenty
five bits are fed to a subtracter 102, and a
difference produced therein is also represented by
twenty five bits. The differences are memorized
into a nine-bit register 104 and a twenty five bit
register 106, respectively.
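
The read-out described above can be illustrated with a
short sketch. The function name is hypothetical, and the
sketch assumes a normalized operand (zeros, denormalized
numbers, infinities and NaNs are not handled).

```python
import struct


def read_operand(value: float) -> tuple[int, int, int]:
    """Split a single precision number into sign, exponent and mantissa.

    The exponent is returned as a nine-bit quantity whose top bit is 0.
    The mantissa is returned as twenty five bits: an extension bit of 0,
    the restored hidden bit of 1, and the twenty three stored bits.
    Assumes a normalized operand.
    """
    word = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = word >> 31
    exponent = (word >> 23) & 0xFF        # fits in nine bits, top bit 0
    stored = word & 0x7FFFFF              # twenty three stored bits
    mantissa = (1 << 23) | stored         # hidden bit restored, bit 24 is 0
    return sign, exponent, mantissa
```

For example, 1.5 is stored as 3FC00000 in hexadecimal, so
its exponent reads as 127 and its mantissa as C00000.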

Reference numeral 108 designates a zero flag
register, and the zero flag is ANDed with the
highest order bit of the difference memorized in the
register 106. The output of the AND gate 110 is
ORed with the highest order bit of the difference
memorized in the register 104, and the output of
the OR gate 112 is supplied to a flag register 114
for memorizing a comparative result. The binary
number indicative of the difference is supplied to
an incrementer 116 through an inverter 118, and
the outputs of the incrementer 116 and the register
104 are fed to a multiplexer 120, and the
multiplexer 120 is responsive to the highest order
bit for steering the outputs. The output of the
multiplexer 120 is memorized in a nine bit register
122.

The mantissa bits MTS1 and MTS2 are fed to
a multiplexer 124, and the multiplexer 124 is
responsive to the comparative result in the flag
register 114 for selectively transferring one of them
to a barrel shifter 126. The barrel shifter 126 is
responsive to the output of the register 122, and the
mantissa bits in the barrel shifter 126 are shifted in
the right direction by a number indicated by the
output of the register 122. The output of the barrel
shifter 126 is supplied to an incrementer 128 through
an inverter 130, and is further supplied to a
multiplexer 132. The sign bits SN1 and SN2 are fed
to an exclusive-OR gate 134, and the multiplexer 132
is responsive to the output of the exclusive-OR gate
134 for selectively transferring to a twenty five bit
register 134. The mantissa bits MTS1 and MTS2 are
supplied to a multiplexer 136, and the output of the
register 114 steers the multiplexer 136 for
transferring one of the mantissa bits MTS1 and MTS2
to a twenty five bit register 138.

The outputs of the registers 134 and 138 are
supplied to an adder 140, and the sum is memorized
into a twenty five bit register 142.

The output of the register 142 is fed to a
multiplexer 144, and the multiplexer 144 selectively
transfers the output of the register 142 or an output
from the multiplying section (shown in Fig. 10) to a
detector 146. The detector counts the bits of "0"
from the highest order side, and detects the first bit
of "1" in the bit string supplied from the
multiplexer 144. If the data bits in the register 142
are represented by one of the strings shown in the
leftmost column of Fig. 11, the data in the other
columns are fed to the shifters 148 and 150 and a
subtracter 156, respectively. In Fig. 11, the mark x
represents either "1" or "0" bit and the data bits in
the register 142 represent a binary number. However,
the data fed to the shifter 150 and the subtracter
156 are represented by hexadecimal numbers,
respectively. The exponential bits EX1 and
EX2 are supplied to a multiplexer 152, and one of
the exponential bits EX1 and EX2 is transferred to
a multiplexer 154 depending upon the output of the
register 114. The multiplexer 154 is responsive to
one of the controlling signals 12, and either output
from the multiplexer 152 or the multiplying section
is transferred to a subtracter 156. The outputs of
the detector 146 are supplied to shifters 148 and
150, respectively, in accordance with Fig. 11. The
shifter 148 shifts the twenty five bits fed from the
multiplexer 144 by a single bit in the right direction
in the presence of the data bit of "1" fed from the
detector 146. On the other hand, the shifter 150 is
a barrel shifter, and shifts the twenty five bits
fed from the shifter 148 by a predetermined number
of bits in the left direction depending upon the
data bits fed from the detector 146. The subtracter
156 subtracts a value represented by the data bits
from the detector 146 from the value represented
by the data bits fed from the multiplexer 154. A
multiplexer 158 is supplied with the sign bits SN1
and SN2, and is responsive to the output of the
register 114 for transferring one of the sign bits
SN1 and SN2 to a multiplexer 160. The multiplexer
160 is responsive to the aforesaid controlling signal,
and transfers one of the sign bit from the
multiplexer 158 and a sign bit from the multiplying
section to the resultant register RG5. The resultant
register RG5 provides a storage for the sign bit, the
exponential bits fed from the subtracter 156 and
the mantissa bits fed from the shifter 150.

As described hereinbefore, the micro-instruction
codes ADD1 to ADD5 are executed to achieve
the adding operation, and the micro-instruction
codes ADD1 to ADD5 are respectively related to
the behaviors in the sections indicated by ADD1 to
ADD5. For better understanding of the circuit
behaviors, description is made for the circuit
behaviors achieved by execution of the
micro-instruction codes ADD1 to ADD5.

When the micro-instruction code ADD1 is decoded
by the controlling unit 10 to produce a part
of the controlling signals, the subtracter 100
subtracts the value represented by the exponential bits
EX2 from the value represented by the exponential
bits EX1, and the difference is memorized in the
register 104. If the difference has a negative value,
the most significant bit in the register 104 is "1".
However, when the subtracter 100 produces zero,
the zero flag is set in the register 108 so as to
indicate no difference. Similarly, the subtracter 102
subtracts the value represented by the mantissa
bits MTS2 from the value indicated by the mantissa
bits MTS1, and the difference is memorized in
the register 106. If the difference is a negative
value, the most significant bit in the register 106 is
"1". Thus, the operand OPA1 in the form of the
floating point number is compared with the operand
OPA2 also in the form of the floating point number
in the execution of the micro-instruction code
ADD1.

When the second micro-instruction code ADD2
is decoded, a second part of the controlling signals
allows the zero flag to be ANDed with the most
significant bit in the register 106 and the output of
the AND gate 110 to be ORed with the most
significant bit in the register 104. The output of the
OR gate 112 is memorized in the register 114 as
the comparative result flag. The comparative result
flag of "1" represents that the floating point number
in the operand register RG1 is smaller than the
floating point number in the operand register RG2.
The second part of the controlling signals permits the
inverter 118 to produce the complementary bits of
the data bits memorized in the register 104, and
the incrementer 116 adds "1" to the value
represented by the complementary bits fed from the
inverter 118. If the most significant bit in the
register 104 is "0", the multiplexer 120 is transparent
to the output of the register 104; if, on the
other hand, the most significant bit is "1", the
multiplexer 120 transfers the output of the
incrementer 116 to the register 122. Then, the data
bits stored in the register 122 are indicative of the
absolute value of the number indicated by the data
bits in the register 104. The value indicated by the
data bits in the register 122 is the difference
between the exponential bits EX1 and EX2, and, for
this reason, the value memorized in the register
122 is used as a scale factor.
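
The behaviors under ADD1 and ADD2 may be sketched as
follows. This is an illustration only: the function name is
hypothetical, the operands are assumed to be normalized, and
the inputs are the nine-bit exponents and twenty five bit
mantissas as read out from the operand registers.

```python
EXP_MASK = (1 << 9) - 1     # nine-bit exponent datapath
MAN_MASK = (1 << 25) - 1    # twenty five bit mantissa datapath


def compare_and_scale(ex1: int, mts1: int, ex2: int, mts2: int) -> tuple[int, int]:
    """Return (comparative result flag, scale factor)."""
    exp_diff = (ex1 - ex2) & EXP_MASK       # subtracter 100 -> register 104
    man_diff = (mts1 - mts2) & MAN_MASK     # subtracter 102 -> register 106
    zero_flag = 1 if exp_diff == 0 else 0   # zero flag register 108
    exp_negative = exp_diff >> 8            # most significant bit of 104
    man_negative = man_diff >> 24           # most significant bit of 106

    # AND gate 110 followed by OR gate 112 -> flag register 114.
    flag = (zero_flag & man_negative) | exp_negative

    # Inverter 118 and incrementer 116 form the two's complement;
    # multiplexer 120 selects it only for a negative difference.
    if exp_negative:
        scale = (~exp_diff + 1) & EXP_MASK
    else:
        scale = exp_diff
    return flag, scale
```

The flag is "1" exactly when the first operand has the
smaller magnitude, and the scale factor is the absolute
difference of the two exponents.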

The third part of the controlling signals is
produced by decoding the third micro-instruction
code ADD3, and causes the multiplexer 124 to
supply the mantissa bits MTS1 to the barrel shifter
126 in the presence of the comparative result flag
of "1" but the mantissa bits MTS2 if the comparative
result flag is "0". In other words, the
multiplexer 124 provides whichever mantissa bits
have the absolute value smaller than that of the
other mantissa bits to the barrel shifter 126. The
barrel shifter 126 shifts the bits fed from the
multiplexer 124 in the right direction depending upon
the value represented by the bits in the register 122,
so that the value represented by the bits in the
register 122 serves as the scale factor. The output of
the barrel shifter 126 is supplied to the inverter 130,
and the complementary bits thereof are fed to the
incrementer 128 for producing the two's complement.
The sign bit SN1 is exclusive-ORed with the sign bit
SN2, and the output of the exclusive-OR gate 134
controls the multiplexer 132. Namely, if the output of
the exclusive-OR gate 134 is "0", the multiplexer 132
is transparent to the output of the barrel shifter 126,
but the multiplexer 132 allows the two's complement
to pass therethrough with the output of the
exclusive-OR gate 134 of "1". This results in that
the mantissa bits with the smaller absolute value are
supplied to the register 134 in the co-presence of
the identical sign bits SN1 and SN2; however, if the
sign bits SN1 and SN2 are different from each
other, the register 134 memorizes the two's
complement of the mantissa bits with the smaller
absolute value. When the comparative result flag is
"0", the mantissa bits MTS1 are memorized in the
register 138; however, the mantissa bits MTS2 are
memorized in the register 138 in the presence of
the comparative result flag of "1". Thus, the
selection of the mantissa bits by the multiplexer 136
is opposite to the selection by the multiplexer 132,
and the register 138 memorizes whichever mantissa
bits have the absolute value larger than that of the
other mantissa bits.

The micro-instruction code ADD4 is causative
of the fourth part of the controlling signals 12, and
the fourth part causes the adder 140 to add the
outputs of the registers 134 and 138 to each other.
The sum is memorized in the register 142 for the
transferring operation.
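
The alignment under ADD3 and the addition under ADD4 may
be sketched on twenty five bit integers. The function name
and argument names are hypothetical; the sketch assumes the
caller has already selected the larger and the smaller
mantissa with the comparative result flag.

```python
MAN_MASK = (1 << 25) - 1    # twenty five bit mantissa datapath


def align_and_add(larger: int, smaller: int, scale: int, signs_differ: bool) -> int:
    """Align the smaller mantissa and add it to the larger one."""
    shifted = smaller >> scale                  # barrel shifter 126
    if signs_differ:
        # Inverter 130 and incrementer 128: two's complement in 25 bits,
        # so that the adder effectively performs a subtraction.
        shifted = (~shifted + 1) & MAN_MASK
    return (larger + shifted) & MAN_MASK        # adder 140 -> register 142
```

For instance, 1.5 plus 0.5 (mantissa 800000 shifted by one
place) yields 1000000 in hexadecimal, a sum whose top bit
shows that a one-bit right shift is needed afterwards, while
1.5 minus 0.5 yields 800000.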

The micro-instruction code ADD5 is decoded
to produce the fifth part of the controlling signals
12 which is used for normalization. Namely, the
multiplexer 144 becomes transparent to the output
of the register 142, and, for this reason, the sum is
transferred to the detector 146. The detector 146
counts the number of the bits of "0" on the higher
orders, and detects the first bit of "1". Then, the
detector 146 supplies the bit and the bits to the
shifter 148, the shifter 150 and the subtracter 156
in accordance with Fig. 11 for which the description
has been made hereinbefore. The multiplexer 152
becomes transparent to whichever exponential bits
have the absolute value larger than that of the other
exponential bits depending upon the comparative
result flag. The multiplexer 154 transfers the output
of the multiplexer 152 to the subtracter 156,
because the micro-instruction code ADD5 is indicative
of the adding operation. The subtracter 156 carries
out the subtracting operation on the exponential
bits and the bits fed from the detector 146 for the
normalization. The output of the subtracter 156
consists of nine bits; however, only the lower eight
bits are memorized in the exponential part of the
resultant register 162, because the most significant bit
is temporarily added to the bit string for indicating
the positive number or the negative number. The
shifter 148 shifts the bit string by a single bit in the
right direction depending upon the decision of the
detector 146 for the normalization. On the other
hand, the barrel shifter 150 shifts the bit string fed
from the shifter 148 in the left direction depending
upon the bits fed from the detector 146 for the
normalization. The output of the barrel shifter 150
consists of twenty five bits, but only twenty
three bits are memorized in the mantissa part of
the resultant register 162, because the most
significant bit is temporarily added to the bit string for
the representation of the negative or positive number.
The bit next to the most significant bit is also
deleted from the bit string according to the IEEE
754 standard. The multiplexer 158 is responsive to
the comparative result flag, and becomes transparent
to either sign bit fed from the operand registers
RG1 and RG2. The sign bit SN1 or SN2 is transferred
to the multiplexer 160, and the multiplexer
160 in turn transfers the sign bit SN1 or SN2 to the
sign part of the resultant register 162, because of
the adding operation. Consequently, the sign bit,
the exponential bits and the mantissa bits are
memorized in the resultant register 162, and the
adding operation is thus completed by memorizing
the calculation result in the resultant register 162.
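
The normalization under ADD5 may be sketched as follows.
Fig. 11 is not reproduced here, so the mapping from the
leading-zero count to the shift amounts is an assumption
based on the description, and both function names are
hypothetical. A zero sum and exponent underflow or overflow
are not handled.

```python
import struct


def normalize(sum25: int, exponent: int) -> tuple[int, int]:
    """Bring the leading "1" of a non-zero 25-bit sum to bit 23."""
    if sum25 >> 24:
        # Carry into the extension bit: single right shift (shifter 148)
        # and the exponent grows by one.
        return sum25 >> 1, exponent + 1
    # Otherwise shift left (barrel shifter 150) by the number of zeros
    # above the leading "1" and reduce the exponent (subtracter 156).
    left = 24 - sum25.bit_length()
    return sum25 << left, exponent - left


def pack_result(sign: int, exponent: int, mantissa: int) -> float:
    """Store the result: drop the extension bit and the hidden bit."""
    word = (sign << 31) | ((exponent & 0xFF) << 23) | (mantissa & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", word))[0]
```

As a usage example, the sum 1000000 with exponent 127
normalizes to mantissa 800000 with exponent 128, which packs
to 2.0; the difference 200000 with exponent 127 normalizes
to exponent 125, which packs to 0.25.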

The multiplying section shown in Fig. 10 is
accompanied with the operand registers RG3 and
RG4, and the operand registers RG3 and RG4
memorize sign bits SN3 and SN4, exponential bits
EX3 and EX4 and mantissa bits MTS3 and MTS4,
respectively. Each of the exponential parts EX3 or
EX4 consists of eight bits, but a single bit is added
to the bit string upon reading out from the
exponential part. Each of the mantissa parts MTS3 or
MTS4 consists of twenty three bits, so that each
operand is in the form of the single precision
floating point format in accordance with the IEEE
754 standard. Since the IEEE 754 standard
requests that the most significant bit be deleted from
the bit string as a hidden bit, the hidden bit and
one more bit are added to the mantissa bits upon
reading out from the mantissa part of the operand
register RG3 or RG4.

In Fig. 10, reference numeral 200 designates
an exclusive-OR gate, and the output of the
exclusive-OR gate 200 is representative of the sign
of the product. The sign bit is fed from the
exclusive-OR gate 200 to the multiplexer 160.
Reference numerals 202 and 204 denote respective
adders for calculating the exponential parts, and
the adders 202 and 204 perform the adding
operations on the nine bit data for producing nine bit
sums, respectively. An inverter is denoted by
reference numeral 206 for producing the complement
bits of all the mantissa bits MTS4. Reference
numeral 208 designates an incrementer for the twenty
five complementary bits, and reference numeral
210 designates a nine bit register. Reference
numerals 212, 214, 216, 218, 220, 222, 224, 226 and
228 respectively designate twenty five bit registers,
and multiplexers are denoted by reference numerals
230, 232, 234, 236, 238 and 240, respectively.
Adders 242, 244, 246, 248, 250 and 252 are provided
for the mantissa parts, and respectively carry
out adding operations on the twenty five bit data.
The multiplying operation is achieved by
execution of the eight micro-instruction codes MUL1
to MUL8, so that the sequence is divided into eight
stages. When the first micro-instruction code MUL1
is decoded by the controlling unit 10, the first part
of the controlling signals is produced for the
multiplying operation. Namely, the first part of the
controlling signals is used for the calculations for
the sign bits SN3 and SN4 and the exponential bits
EX3 and EX4 as well as for the production of
multiples of the Booth's multiplication algorithm. In
this instance, the second order Booth's multiplication
algorithm is employed in the multiplying section.
According to the Booth's multiplication algorithm,
the multiplier is divided into twelve three-bit
sections each overlapped with the adjacent sections
by one bit. With the three bit sections, +2
times the multiplicand, +1 time the multiplicand,
zero times the multiplicand, -1 time the multiplicand
and -2 times the multiplicand are selectively added
together for achievement of the multiplication. For
this reason, the multiples are previously produced
from the operand memorized in the operand
register RG4. Then, let us trace the sequence with
reference to Fig. 10. First, the sign bits SN3 and
SN4 are supplied to the exclusive-OR gate 200,
and the output of the exclusive-OR gate indicative
of the sign of the product is transferred to the
multiplexer 160. The exponential bits EX3 are added
to the exponential bits EX4 by the adder 202, and
the sum is memorized in the register 210. Since
the bit of "0" is previously added to the bit string
of the exponential part EX3 or EX4 upon reading
out from the operand register RG3 or RG4, no
overflow takes place in the adder 202 and,
accordingly, the register 210. The mantissa bits MTS4
are inverted by the inverter 206, then the complement
bit string is transferred to the incrementer 208.
Since the mantissa part MTS4 consists of the original
twenty three bits, the hidden bit of "1" and the
extension bit of "0", the complement bit string also
consists of the twenty five bits. The complement bit
string is incremented by one, and, for this reason,
the output of the incrementer 208 is the two's
complement. The two's complement is memorized
in the register 216. The two's complement is
shifted by one bit in the right direction, and is,
then, memorized in the register 218. This results in
that the registers 216 and 218 respectively retain
-1 time the value indicated by the mantissa bits
MTS4 and -1/2 time the value indicated by the
mantissa bits MTS4. The register 216 provides the
mantissa bits MTS4 to the register 212, and the
mantissa part MTS4 is shifted by one bit in the right
direction. The bit string thus shifted is memorized
in the register 214, so that +1 time the value
indicated by the mantissa bits MTS4 and +1/2
time the value indicated by the mantissa bits MTS4
are respectively retained in the registers 212 and
214. The arithmetic shifts are carried out through
coupling wires, so that no shifter is provided in
association with the registers 212 to 218.
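
The second order Booth recoding mentioned above may be
sketched in its generic textbook form. This is not a model
of the datapath of Fig. 10: the function names are
hypothetical, the multiples are formed as +2, +1, 0, -1 and
-2 times the multiplicand rather than with the halved values
held in the registers 212 to 218, and one extra section is
appended so that an unsigned multiplier is handled exactly.

```python
# Digit selected by each overlapping three-bit section
# (b[2k+1], b[2k], b[2k-1]); the digit equals -2*b[2k+1] + b[2k] + b[2k-1].
BOOTH_DIGIT = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_digits(multiplier: int, bits: int = 24) -> list[int]:
    """Recode a non-negative multiplier into digits, lowest section first."""
    padded = multiplier << 1          # extension bit of "0" below bit 0
    count = bits // 2 + 1             # extra section keeps the value unsigned
    return [BOOTH_DIGIT[(padded >> (2 * k)) & 0b111] for k in range(count)]


def booth_multiply(multiplicand: int, multiplier: int, bits: int = 24) -> int:
    """Add the selected multiples, each weighted by two more bit places."""
    total = 0
    for k, digit in enumerate(booth_digits(multiplier, bits)):
        total += (digit * multiplicand) << (2 * k)
    return total
```

Because each section contributes a single selected multiple,
a twenty four bit multiplier needs about half as many
additions as a bit-by-bit shift-and-add scheme.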

With the second part of the controlling signals
produced by decoding the micro-instruction code
MUL2, the multiplying section starts the
multiplication of the lower twelve bits of the mantissa
bits MTS3 and the mantissa bits MTS4. As will be seen
from Fig. 12, the mantissa bits MTS3 are divided
into twelve sections SEC1, SEC2, SEC3, SEC4,
SEC5, SEC6, SEC7, SEC8, SEC9, SEC10, SEC11
and SEC12 which are transferred to the multiplexers
230 to 240. By the way, mi (where i is 0, ...,
and 22) are component bits of the mantissa part
MTS4. The seventh section SEC7 consisting of the
component bits m11, m10 and m9 is transferred to
the multiplexer 230, and the multiplexer 232 is
supplied with the eighth section SEC8 consisting of
the component bits m9, m8 and m7, the ninth
section SEC9 consisting of the component bits m7,
m6 and m5 being transferred to the multiplexer
234, the tenth section SEC10 consisting of the
component bits m5, m4 and m3 being transferred
to the multiplexer 236, the eleventh section SEC11
consisting of the component bits m3, m2 and m1
being transferred to the multiplexer 238, the last
section SEC12 consisting of the component bits
m1, m0 and an extension bit of "0" being
transferred to the multiplexer 240. The sections SEC1 to
SEC6 are transferred to the respective multiplexers
in the execution of the micro-instruction code
MUL4, and the sections SEC7 to SEC12 are used in
the selections under the micro-instruction code
MUL2. Each of the multiplexers 230, 232, 234, 236,
238 or 240 is coupled at the input nodes thereof to
the registers 212, 214, 216 and 218, and the bit
"0" is also supplied to one of the input nodes.
Each of the multiplexers transfers the bits of the
selected register or the bit "0" depending upon the
bit pattern of the section supplied thereto. Each of
the outputs of the multiplexers is shifted by two
bits upon transferring to the adder 242, 244 or 246.
The outputs of the multiplexers 230 to 240 are
added together by the associated adders 242, 246
and 248, respectively, and the sums are
respectively memorized in the registers 220, 222 and 224.
The subtracter 204 subtracts a constant value
(7FH) from the sum memorized in the register 210,
and the constant value (7FH) is the bias value for
the exponential part of the single precision floating
point format defined by the IEEE 754 standard.
Both of the exponential parts EX3 and EX4 contain
the bias value, and the value indicated by the bits
in the register 210 contains twice the bias value
after the adding operation carried out by the adder
202. In order to normalize the exponential part, the
subtracter 204 deletes one of the bias values, and
the difference is transferred to the adding section.
When the third micro-instruction code MUL3 is
decoded, the third part of the controlling signals 12
allows the bit string read out from the register 224
to be shifted by four bits in the right direction
through a coupling wire. The value indicated by the
bits memorized in the register 222 is added to the
value indicated by the bits in the register 224, and
the sum is transferred from the adder 248 to the
register 226 for memorization.

The fourth micro-instruction code MUL4 is
decoded by the controlling unit 10, and the fourth part
of the decoded controlling signals allows the
multiplication of the lower twelve bits of the mantissa
part MTS3 to be completed and the higher twelve
bits to enter the sequence in a pipeline fashion.
The bit string memorized in the register 226 is
shifted by four bits in the right direction, and the
value indicated by the bits thus shifted by four bits
is added to the value indicated by the bits in the
register 220. The sum is transferred from the adder
250 to the register 228, and is memorized therein.
Thus, the lower twelve bits of the mantissa part
MTS3 are multiplied by the mantissa part MTS4, and
the result is stored in the register 228.

Simultaneously, the higher twelve bits of the
mantissa MTS3 are treated through a sequence
similar to that controlled upon execution of the
micro-instruction code MUL2. Namely, as shown in Fig.
12, the first section SEC1 consisting of an extension
bit of "1" and the component bits m22 and
m21 is supplied to the multiplexer 230, and the
multiplexer 232 is supplied with the second section
SEC2 consisting of the component bits m21, m20
and m19, then the third section SEC3 consisting of
the component bits m19, m18 and m17 being
transferred to the multiplexer 234, then the fourth
section SEC4 consisting of the component bits m17,
m16 and m15 being transferred to the multiplexer
236. In the similar manner, the fifth section SEC5 is
transferred to the multiplexer 238, and the
multiplexer 240 is supplied with the sixth section
SEC6. The multiplexers thus supplied with the
respective sections SEC1 to SEC6 become transparent
to one of the bit groups or the bit of "0"
depending upon the bit pattern as shown in Fig. 13.
Each of the bit strings from the multiplexers 232,
236 and 240 is shifted by two bits through a
coupling wiring, and the bit strings fed from the
multiplexers 230 to 240 are added by the associated
adders 242, 244 and 248, and the sums are
memorized in the registers 220, 222 and 224,
respectively.

The fifth micro-instruction code MUL5 is also
causative of two different behaviors in a pipeline
fashion. Namely, the bits in the register 228 are
added to the bits in the register 230, and the sum
is memorized in the register 230. Since the register
230 initially stores the bit string consisting of the
bits of "0", the register 230 memorizes the same
bit string upon completion of the first addition. The
second behavior is similar to that caused by the
third micro-instruction code MUL3. Namely, the bit
string in the register 224 is shifted by four bits in
the right direction, and the bits thus shifted by four
bits are added to the bits in the register 222. The
sum is memorized in the register 226.

The sixth micro-instruction code MUL6 is
decoded to produce the sixth part of the controlling
signals 12. The sixth part of the controlling signals
12 is causative of the behavior for the upper twelve
bits of the mantissa bits MTS3 similar to that
produced by the fourth part of the controlling
signals 12. Namely, the bit string in the register 226 is
shifted by four bits in the right direction, and the
bits thus shifted in the right direction are added to
the bits from the register 220. The sum is
transferred from the adder 250 to the register 228, and
is memorized in the register 228 as the product of
the higher twelve bits of the mantissa part MTS3
and the mantissa part MTS4.

When the seventh micro-instruction code
MUL7 is decoded by the controlling unit 10, the
seventh part of the controlling signals 12 takes
place to complete the multiplying operation.
Namely, since the register 230 retains the product of
the lower twelve bits of the mantissa part MTS3 and
the mantissa bits MTS4, the bit string indicative of
the product is shifted by twelve bits in the right
direction, and, then, the bits thus shifted are added
to the bits fed from the register 228 to produce the
product of the mantissa parts MTS3 and MTS4.
The final product is stored in the register 230.
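
The combination of the two partial products may be sketched
with exact integers. The function name is hypothetical. The
description aligns the two parts by shifting the lower
product twelve bits to the right, which discards low-order
bits; the sketch instead shifts the higher product to the
left so that the identity can be checked exactly.

```python
def split_multiply(mts3: int, mts4: int) -> int:
    """Multiply by the lower and the higher twelve bits separately."""
    low = mts3 & 0xFFF              # lower twelve bits of the multiplier
    high = mts3 >> 12               # higher twelve bits of the multiplier
    low_product = low * mts4        # completed under MUL4
    high_product = high * mts4      # completed under MUL6
    # The higher product carries a weight of twelve more bit places.
    return (high_product << 12) + low_product
```

The result equals the direct product of the two mantissas,
which is why the two halves can be processed one after the
other in a pipeline fashion.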

The final micro-instruction code MUL8 is used
for the normalization similarly to the adding
operation. Since the multiplying operation is carried out
on the normalized numbers, the final product is
either an already normalized number, or the final
product needs to be normalized in the right direction
by one bit only. This normalization is carried out by
the detector 146, the shifter 148 and the subtracter
156. Namely, the multiplexer 144 becomes transparent
to the bits from the register 230, and the bits from
the register 230 are transferred to the detector 146.
The detector 146 counts the number of the bits of "0"
from the highest order side, and detects the first bit
"1" for producing the bit and bits in accordance
with Fig. 11. The bit and the bits are supplied to
the shifter 148, the barrel shifter 150 and the
subtracter 156. The multiplexer 154 is transparent to
the bits from the subtracter 204, because of the
multiplying operation, and the difference from the
subtracter 204 is transferred to the subtracter 156.
The subtracter 156 subtracts the value indicated by
the data bits fed from the detector 146 from the
difference so as to normalize the exponential part.
The normalized exponential part consists of nine
bits, but only the lower eight bits are memorized in
the exponential part of the resultant register 162,
because the most significant bit was added for the
sake of indicating a positive or negative number.
The shifter 148 shifts the bit string in the right
direction by one bit, if necessary for the
normalization. However, no left shifting is needed, so
that the lower twenty three bits pass through the
barrel shifter 150 and are memorized in the mantissa
part of the resultant register 162. The most significant
bit was added to the mantissa bits for indicating a
negative or positive number, and, for this reason,
the most significant bit is deleted from the bit
string. The hidden bit is also deleted from the
mantissa bits in accordance with the IEEE 754
standard. The multiplexer 160 transfers the sign bit
from the exclusive-OR gate 200 to the sign part of
the resultant register 162. Thus, the multiplying
operation is completed by the final micro-instruction
code MUL8.
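
The reason why a single right shift suffices may be sketched
as follows. The function name is hypothetical, the inputs
are assumed to be normalized twenty four bit mantissas (the
hidden bit included), and the low-order bits of the product
are simply truncated, which is an assumption since rounding
is not discussed in the description.

```python
def normalize_product(mts3: int, mts4: int, exponent: int) -> tuple[int, int]:
    """Normalize the product of two normalized 24-bit mantissas."""
    # Each mantissa represents a value of at least 1 and less than 2,
    # so the product is at least 1 and less than 4.
    product = mts3 * mts4
    if product >> 47:
        # Product is 2 or more: one right shift, exponent plus one.
        product >>= 1
        exponent += 1
    mantissa = product >> 23        # keep 24 bits, truncating the rest
    return mantissa, exponent
```

For example, 1.5 times 1.5 is 2.25, so the product is
shifted once and stored as 1.125 with the exponent raised by
one, whereas 1.0 times 1.5 needs no shift.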

Although particular embodiments of the present 
invention have been shown and described, it will be 
obvious to those skilled in the art that various 
changes and modifications may be made without 
departing from the spirit and scope of the present 
invention. 



Claims

1. An arithmetic processing unit provided in
association with a central processing unit having a
plurality of instruction codes including a plurality of
macro-instruction codes respectively representative
of arithmetic operations on scalar numbers, on a
vector and a matrix, on a plurality of vectors and on
a plurality of matrices, characterized by the
combination of

a) a program memory unit storing a plurality
of microprograms including microprograms
corresponding to said macro-instructions representative
of the arithmetic operations;

b) an instruction decoder unit supplied with
one of the macro-instruction codes and producing
a decoded signal indicative of a starting address of
one of said microprograms corresponding to said
one of the macro-instructions for successively
reading out a micro-instruction code sequence
from said program memory;

c) a controlling unit responsive to the micro- 
instruction code sequence and operative to pro- 
duce a plurality of controlling signals and to shift a 
busy signal between an active level and an inactive 
level so as to cause the central processing unit to 
enter a waiting state and to be recovered there- 
from; 

d) an internal register array having a plurality
of registers for memorizing a plurality of operand
codes, sums, products and product-sums;

e) a resultant register for storing one of said
sums of said operand codes, one of said products of
said operand codes and one of said product-sums;

f) a plurality of operand registers partially
assigned to said operand codes serving as an
augend and an addend and partially assigned to
said operand codes serving as a multiplicand and a
multiplier;

g) an arithmetic and logic unit responsive to
said controlling signals and operative to perform at
least arithmetic operations on the operand codes in
said operand registers for producing one of said
sums, one of said products and one of said
product-sums in said resultant register; and

h) a data input-and-output port communicable
with external units including said central
processing unit for receiving said operand codes and
for transferring one of said sums, one of said
products and said product-sums.